Abstract

Eukaryotic cells carry two genomes, nuclear (nDNA) and mitochondrial 
(mtDNA), which are ostensibly decoupled in their replication, segregation and
inheritance. It is increasingly appreciated that heteroplasmy, the occurrence of
multiple mtDNA haplotypes in a cell, plays an important biological role, but its
features are not well understood. Until now, accurately determining the diversity 
of mtDNA has been difficult due to the relatively small amount of mtDNA in 
each cell (< 1% of the total DNA), the intercellular variability of mtDNA content 
and copies of mtDNA pseudogenes in nDNA. To understand the nature of 
heteroplasmy, we developed Mseek, a novel technique that purifies and sequences 
mtDNA. Mseek yields high purity (> 98%) mtDNA and its ability to detect rare 
variants is limited only by sequencing depth, providing unprecedented sensitivity 
and specificity. Using Mseek, we confirmed the ubiquity of heteroplasmy by 
analyzing mtDNA from a diverse set of cell lines and human samples. Applying 
Mseek to colonies derived from single cells, we find heteroplasmy is stably 
maintained in individual daughter cells over multiple cell divisions. Our 
simulations suggest the stability of heteroplasmy is facilitated by the exchange of 
mtDNA between cells. We also explicitly demonstrate this exchange by 
co-culturing cell lines with distinct mtDNA haplotypes. Our results shed new light 
on the maintenance of heteroplasmy and provide a novel platform to investigate 
various features of heteroplasmy in normal and diseased tissues. 

Introduction

Mitochondria are organelles present in almost every eukaryotic cell[1]. They enable
aerobic respiration[2] to efficiently generate ATP, and play an important role in
oxygen sensing, inflammation, autophagy, and apoptosis[3, 4]. Mitochondrial activity
relies on over a thousand proteins, mostly coded by the nuclear DNA in humans[5],
but genes from the mitochondrial genome, a small circular DNA (mtDNA), play
a critical role in their function. In humans, the mtDNA is ≈ 17 kbp and codes
thirteen proteins critical for the electron transport chain, along with twenty-two tRNAs,
two rRNAs and a control region, called the displacement loop (D-loop) (Fig. S1)[6].
Their genetic code differs from the nuclear code. In mammalian mitochondria, ATA
codes for Methionine instead of Isoleucine, TGA codes for Tryptophan instead of
the stop codon, and AGA, AGG code for stop codons instead of Arginine, hinting
at a bacterial origin[7]. Mitochondria are inherited solely from the mother and
reproduce without recombination. Each mitochondrion carries multiple mitochondrial
genomes (5-10)[8] and each cell contains hundreds to thousands of mitochondria,
depending on the tissue[9]. Inherited mutations in mtDNA have been linked to
several genetic disorders including diabetes mellitus and deafness (DAD) and Leber's
hereditary optic neuropathy (LHON)[10]. De novo mutations in mtDNA have also
been linked to diseases[11, 12, 13, 14].
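
The codon reassignments described above can be illustrated with a small lookup. This is a minimal sketch covering only the four codons named in the text plus ATG and TGG for comparison; the table and function names are hypothetical and are not part of Mseek.

```python
# Nuclear (standard) assignments for the codons discussed in the text.
STANDARD = {
    "ATA": "Ile", "ATG": "Met",
    "TGA": "Stop", "TGG": "Trp",
    "AGA": "Arg", "AGG": "Arg",
}

# Mitochondrial reassignments exactly as stated in the text.
MITO_OVERRIDES = {"ATA": "Met", "TGA": "Trp", "AGA": "Stop", "AGG": "Stop"}


def translate_codon(codon: str, mitochondrial: bool) -> str:
    """Translate one codon with either the nuclear or the mitochondrial code."""
    codon = codon.upper()
    if mitochondrial and codon in MITO_OVERRIDES:
        return MITO_OVERRIDES[codon]
    return STANDARD[codon]


def differing_codons() -> list:
    """Codons (from this partial table) whose meaning differs between the codes."""
    return sorted(c for c in MITO_OVERRIDES if MITO_OVERRIDES[c] != STANDARD[c])
```

For example, `translate_codon("TGA", mitochondrial=True)` gives Tryptophan, whereas the same codon read with the nuclear code is a stop. This is one reason a Numt copy of a mitochondrial gene is not expected to translate into the same protein in the nucleus.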

Heteroplasmy, which is the occurrence of multiple mtDNA haplotypes, has been
documented in a variety of human tissues[15] and in samples from the 1000 Genomes
Project[16]. Accurate determination of heteroplasmy, especially the low-frequency
haplotypes, is needed for disease-association studies with mtDNA, as well as stud- 
ies of metabolic activity of cancer cells [17]. Deep sequencing is the only means 
to identify novel mtDNA haplotypes as well as somatic mutations in tissues and 
perform association studies to link the haplotypes to disease states. However, mea- 
surements of heteroplasmy are compromised by copies of large segments of mtDNA, 
called Nuclear-mtDNA pseudogene sequences (Numts), present in the mammalian 
nDNA[18] (Fig. S2). Thus, accurate determination of heteroplasmy requires pu- 
rification of mtDNA. Without purification, Numts contaminate the measurements 
of mtDNA variants, and introduce inaccuracies in the estimates of heteroplasmy, 
especially because Numts exhibit variability and occur in variable copy numbers 
similar to any other part of the nDNA. Isolating mtDNA has long been a chal- 
lenge. In forensics and genealogy, allele-specific primer extensions (SNaPshot) are 
used for genotyping mtDNA[19]. Hypervariable regions (HVR) in the D-loop have
been amplified using PCR[20]. Entire mtDNA has been accessed using primers 
specific to mtDNA to either perform long-range PCR[21], or amplify overlapping 
fragments[15]. Isolation of organelles by ultra-high-speed centrifugation has also 
been used, though the yields are low along with contamination from fragmented nu- 
clear DNA[22]. Computational methods have also been used to infer heteroplasmy 
from whole-exome[23, 16] and whole-genome data[24].

Heteroplasmy estimates derived from PCR-based methods are error-prone, due to
variability in amplification. Errors also arise from clonal amplification of variants arising
from mistakes of polymerases, a common problem in PCR-amplicon sequencing. 
Additionally, sequence and copy number variations of Numts confound results from 
computational and PCR-based methods in unpredictable ways. Thus, none of the 
methods outlined above are able to accurately identify low-frequency variants in 
mtDNA. 

We present here Mseek, a novel method to enzymatically purify mtDNA by
depleting linear nDNA, and to sequence it inexpensively. By applying Mseek to several
cell-lines and human peripheral blood mononuclear cells (PBMC), we identified 
multiple mtDNA haplotypes in the samples. A major benefit of this method is the 
ability to call extremely rare variants, with sensitivity of calls only limited by the 
sequencing depth. Sequencing errors can also be overcome with more sequencing, 
which is not always possible, especially in PCR amplicon sequencing. Additionally, 
through clonal expansion of single cells from a variety of cell lines, we establish that 
heteroplasmy is stably maintained at a single cell level through multiple divisions. 
This suggests active intercellular exchange of mtDNA. This exchange is explicitly 
demonstrated by co-culturing two different cell-lines with distinct mtDNA haplo- 
types, labeling one cell line with GFP, and sorting the cells after many generations 
(> 25) to show mtDNA haplotypes unique to one cell-line selectively appear in 
the other. These results, in conjunction with simulations, suggest that exchange of
mtDNA between cells is a source of renewal and stability. 

Results 

Mseek: An efficient method to isolate and sequence mtDNA 

The potential relevance of mtDNA to many diseases requires a method to accu- 
rately determine the diversity of mtDNA in populations of cells. However, as noted, 
one of the major problems of existing approaches is the presence of nuclear DNA,
which contains sequences of high homology to mtDNA (Numts), making it diffi- 
cult to discern mtDNA from nDNA (Fig. S2). To address this issue, we sought to 
take advantage of the difference in topology between nDNA and mtDNA using an 
exonuclease to digest the linear nDNA, while leaving intact the circular mtDNA. 
Total DNA was extracted from HEK 293T cells, and digested with exonuclease V 
or left undigested. To determine the outcome, we PCR amplified sequences specific 
to nDNA or mtDNA using appropriate primers. As expected, in the undigested 
samples of total DNA we could detect both nDNA and mtDNA (Fig. 1A). In sharp
contrast, in the samples treated with exonuclease V we could only detect mtDNA
(Fig. 1B). The lengths of the expected PCR products are shown in Fig. 1C. Using this
approach, mtDNA was prepared and sequenced on the Illumina MiSeq platform.
Out of a total of 3.05 million 100 nt reads, 1.233 million mapped to the
mitochondrial genome and 50,000 (< 2%) mapped to the nDNA. The remainder were adapter
dimers, which are sequencing artifacts currently filtered out experimentally using
Ampure beads. Over 98% of the mappable reads were derived from mtDNA with an
average coverage > 3000X (Fig. 1D). More than 50 distinct samples were processed
similarly to consistently obtain high purity mtDNA sequence. 

The error rate per base of the reads is approximately 1 in 1000 (Q score > 30).
Using at least 10 non-clonal reads to make a variant call reduces errors from
sequencing to much less than 1 in a million. This coverage also allows removal of
variants with a significant bias towards one strand, a known source of errors on the 
Illumina platform[25]. Contamination from the small amount of nDNA left in the 
samples does not contribute appreciably to the noise as Numts are a small fraction 
of total nDNA. Thus, calling rare variants to any level of sensitivity only depends 
on the depth of sequencing. This approach, designated Mseek (Fig. 2), provides 
a means of unmatched efficiency in accurately sequencing the mtDNA contained 
within a population of cells.
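
The effect of requiring several supporting reads can be illustrated with a binomial tail probability. The sketch below assumes that sequencing errors at a position are independent across reads and that any erroneous base counts towards a false call (a conservative simplification); the function name and the example depths are illustrative, not taken from the Mseek pipeline.

```python
from math import exp, lgamma, log, log1p


def prob_at_least(depth: int, min_reads: int, error_rate: float = 1e-3) -> float:
    """P(at least `min_reads` of `depth` reads carry an error at one position).

    Uses a binomial model with independent per-read errors. Terms are
    evaluated in log space so that large depths do not overflow.
    """
    if not 0.0 < error_rate < 1.0:
        raise ValueError("error_rate must lie strictly between 0 and 1")
    if depth < 0 or min_reads < 0:
        raise ValueError("depth and min_reads must be non-negative")
    total = 0.0
    for k in range(min_reads, depth + 1):
        log_term = (
            lgamma(depth + 1) - lgamma(k + 1) - lgamma(depth - k + 1)
            + k * log(error_rate)
            + (depth - k) * log1p(-error_rate)
        )
        total += exp(log_term)
    return min(total, 1.0)
```

Under this model the tail probability falls steeply as `min_reads` grows at a fixed depth: `prob_at_least(100, 10)` is many orders of magnitude smaller than `prob_at_least(100, 3)`. The value also depends on `depth`, so in practice the read threshold would be chosen relative to the coverage actually achieved.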

Ubiquity of heteroplasmy 

Since cell lines are clonally derived, the expectation is that the nDNA (and mtDNA) 
are identical across cells. We decided to explore the diversity of mtDNA in a variety 
of cell lines to test the expectation that mtDNA would be homoplasmic in cell lines, 
since, either a slight fitness advantage of one haplotype or drift [26] would lead to 
a clonal selection and homoplasmy. We applied Mseek to thirty samples including 
four human PBMCs and human cell lines derived from human diploid fibroblasts 
(501T), glioma (A382) and breast carcinoma (HCC1806 and MDA-MB-157). 

The mtDNA sequences were analyzed for variations in order to infer universal 
features in mtDNA variability and differences between human cell lines and blood- 
derived mtDNA. Repeat content of the sequences was computationally identified to 
estimate nDNA contamination, which ranged from 0.5 to 1.5%, further confirming
the specificity of Mseek. Importantly, because of this high degree of mtDNA purity 
(> 98%) we were able to multiplex all 30 samples in a single MiSeq run, with average 
coverage of > 100X. 

Variants with a frequency of either 0 or 1 in the population arise from
homoplasmic mtDNA. Intermediate frequencies between 0 and 1 imply the co-existence
of multiple haplotypes in the population. Strikingly, in both cell-lines and human
blood-derived mtDNA, we observed variants occurring in the 0.1 to 0.9 frequency
range (Fig. 3), indicating that multiple haplotypes were present in the samples. The
tool Mutation Assessor[27] was used to label the variants as high, medium, low, or 
neutral signifying their predicted impact on protein function. Cell-lines and human 
PBMCs did not exhibit putative deleterious mutations at high frequency, consistent 
with the expectation that functioning cells should have functional mitochondria. 
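
The distinction between homoplasmic and heteroplasmic variants drawn above can be written as a short classification rule. This is a minimal sketch; the function names are hypothetical, and the 0.1 to 0.9 bounds are simply the range reported in the text rather than a recommended threshold.

```python
def variant_frequency(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads at a position that carry the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must lie between 0 and total_reads")
    return alt_reads / total_reads


def classify(frequency: float) -> str:
    """Frequencies of exactly 0 or 1 are homoplasmic; anything between is a mix."""
    if not 0.0 <= frequency <= 1.0:
        raise ValueError("frequency must lie in [0, 1]")
    return "homoplasmic" if frequency in (0.0, 1.0) else "heteroplasmic"


def in_reported_range(frequency: float, low: float = 0.1, high: float = 0.9) -> bool:
    """True if the frequency falls in the intermediate range highlighted in the text."""
    return low <= frequency <= high
```

A variant seen in 36 of 100 reads has a frequency of 0.36, is classified as heteroplasmic, and falls inside the reported range.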

The mtDNA has a few non-coding regions outside of the D-loop which occur as 
gaps between genes. None of the samples exhibited mutations in these regions, sug- 
gesting an evolutionarily conserved role, such as in transcriptional control, for these 
regions. Each sample had unique, distinguishing mutations, ranging in frequency 
from 0.36 to 1.0. There were a number of unique variants in the four human PBMC 
samples (ranging in number from 5 to 15) and in the cell lines (ranging in number 
from 5 to 21). 

Since the cell lines were derived from a variety of tissues, our findings have some 
level of universality. There were no key distinguishing features between cell-line 
and human blood-derived mtDNA, in terms of deleterious mutations or degree of 
heteroplasmy, contrary to findings from a study based on whole-genome sequencing 
of TCGA samples[24]. Our findings are consistent with another study based on 
colorectal cancer[15]. 

Stability of heteroplasmy in cell-lines 

The results above indicate heteroplasmy exists within a cell population but do not 
establish heteroplasmy in individual cells, since a mixture of homoplasmic cells with 
different haplotypes would give the same result. In order to establish heteroplasmy 
in individual cells, we placed the severest possible bottleneck on the population by 
deriving colonies from single cells, utilizing MDA-MB-157 and U2OS, breast
carcinoma and osteosarcoma lines respectively (Fig. 4). In each of the derived colonies (8
colonies), the variants from the original lines remained in the derived colonies and 
at approximately the same frequencies as in the original tumor lines. The sharing 
of mutations between the original and derived colonies suggests that the diversity 
in mtDNA exists in individual cells. The preservation of the frequencies between 
the original and derived colonies indicates further that this heteroplasmy is uniform 
across cells in the original line (Fig. 4). Since the new clonal lines underwent at least
25 divisions from the single-cell stage, these results also suggest that heteroplasmy 
is stably maintained over multiple generations with no signs of selection or drift. 
Over many divisions, errors in replication should have increased diversity in het- 
eroplasmy, while small differences in fitness and drift should lead to homoplasmy. 
In fact, drift has been proposed as a mechanism for the selection of homoplasmic 
mtDNA mutations in tumors [26], which has been corroborated in other studies[15]. 
In light of these reports, our findings are quite unexpected. 

A simple model of mtDNA genetics assumes random assortment of mtDNA haplotypes
between daughter cells upon cell division, along with multiplication of
mitochondria. This model would predict drift towards homoplasmy, as seen in our
simulation of this process (Fig. 5) and by others[26]. The rate of drift in haplotype 
frequencies is a function of the number of mtDNA molecules per cell and the orig- 
inal frequency of the haplotypes (Fig. 5). After many passages, irrespective of the 
original mtDNA distribution, the likelihood of two randomly selected cells having 
the same heteroplasmic mix would be extremely low, which is at odds with the 
stable and uniform heteroplasmy that we observed in the clonally-derived cell-lines. 
This suggests the existence of an active mechanism to counteract this drift. 
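
The random-assortment model described above can be sketched in a few lines. This is an illustrative reimplementation under stated assumptions, not the authors' simulation code: each mtDNA molecule is assumed to go to the tracked daughter with probability 1/2, and regrowth to the quorum N is assumed to copy a randomly chosen existing molecule. The defaults (N = 250, p = 0.3, 26 divisions) mirror values quoted in the Figure 5 caption; all function names are hypothetical.

```python
import random
from statistics import pvariance


def divide(a: int, n: int, rng: random.Random) -> int:
    """One division: return the count of haplotype A (out of n) in one daughter."""
    b = n - a
    while True:
        # Random assortment: each molecule is kept by this daughter with p = 1/2.
        a_kept = sum(rng.random() < 0.5 for _ in range(a))
        b_kept = sum(rng.random() < 0.5 for _ in range(b))
        if a_kept + b_kept > 0:  # retry the (vanishingly rare) empty daughter
            break
    # Regrowth: replicate a randomly chosen molecule until the quorum is restored.
    while a_kept + b_kept < n:
        if rng.random() < a_kept / (a_kept + b_kept):
            a_kept += 1
        else:
            b_kept += 1
    return a_kept


def simulate_drift(n=250, p=0.3, divisions=26, cells=200, seed=0):
    """Follow `cells` independent lineages; return haplotype-A frequencies per division."""
    rng = random.Random(seed)
    counts = [round(p * n)] * cells
    history = []
    for _ in range(divisions):
        counts = [divide(a, n, rng) for a in counts]
        history.append([a / n for a in counts])
    return history
```

Comparing `pvariance(history[0])` with `pvariance(history[-1])` shows the spread of frequencies across lineages widening with the number of divisions, which is the drift referred to in the text. A homoplasmic start (p = 0 or p = 1) stays homoplasmic, since the model includes no mutation.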

Exchange of mtDNA between cells within a population is the simplest explana- 
tion for the uniformity of heteroplasmy and its stability. Exchange can counteract 
the effects of drift by bringing the haplotype distribution closer to the average of 
the distribution across cells within the population. Other explanations, such as a 
balancing selection[28] could also be invoked to explain the lack of drift. This can be 
discounted because most variants are neutral and specific to each cell line, suggest- 
ing the selection needs to be different for each cell line without an obvious selective 
pressure. 
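
How exchange could counteract drift can be illustrated with a compact toy model. The sketch below makes two simplifying assumptions that are not taken from the paper: drift at each division is approximated by resampling N molecules from the cell's current haplotype mix, and exchange is modeled as each cell replacing a fixed fraction of its mtDNA with molecules at the population-average frequency. The `exchange` parameter and the function name are hypothetical.

```python
import random
from statistics import mean, pvariance


def simulate(n=250, p=0.3, divisions=26, cells=100, exchange=0.0, seed=0):
    """Return per-cell haplotype frequencies after drift with optional exchange."""
    if not 0.0 <= exchange <= 1.0:
        raise ValueError("exchange must lie in [0, 1]")
    rng = random.Random(seed)
    freqs = [p] * cells
    for _ in range(divisions):
        # Drift: resample n molecules per cell from its current haplotype mix.
        freqs = [sum(rng.random() < f for _ in range(n)) / n for f in freqs]
        # Exchange: pull every cell towards the population-average frequency.
        pool = mean(freqs)
        freqs = [(1.0 - exchange) * f + exchange * pool for f in freqs]
    return freqs
```

With `exchange=0.0` the between-cell variance, `pvariance(simulate())`, grows as in the pure drift model, while a moderate value such as `exchange=0.2` keeps the cells clustered near the population average. The qualitative behavior, not the specific numbers, is the point of the sketch.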

Experimental demonstration of mtDNA exchange between cells 

In order to explicitly demonstrate the exchange of mtDNA between cells, we
co-cultured cell-lines with distinct private haplotypes. Two sets of pairs including
MDA-MB-157 and HCC as well as A382 and U2OS were used. For each pair, one
of the cell-lines was labeled with GFP (by transfection with a vector expressing 
GFP). After approximately 20 passages, the cells were sorted for the GFP marker 
by FACS, and mtDNA from the sorted cells was sequenced. The sorted cells were
greater than 99% pure based on FACS. 

Tables 1 and 2 show the results of sequencing mtDNA from these co-culture
experiments. We detected variants private to one cell-line in the co-cultured partner 
cell-line, suggesting the transfer of mtDNA between the cell-lines. Not every private 
variant was transferred, arguing against the results arising from errors in sorting or 
cytoplasmic/nuclear exchange between cells. The purity of the sorted cells, based 
on FACS, further suggests that nuclear exchange does not account for the findings. 

Discussion 

Accurate sequencing of mtDNA is important for sensitive measurements of hetero- 
plasmy, whose variability can have clinical significance, as a biomarker and in disease 
progression [29]. Mseek provides a means to purify and deeply sequence mtDNA and 
determine heteroplasmy accurately by eliminating Numts and PCR-related biases. 
The sensitivity of Mseek is a function of sequencing depth alone. This is one of the 
most detailed and extensive surveys of mtDNA from cell lines yet obtained.

Accurate identification of variant frequencies is not possible through deep
sequencing methods currently in use. So far, deep sequencing approaches to mtDNA
have used either long-range PCR[21, 30], or a multitude of mtDNA-specific PCR
primers to amplify short overlapping mtDNA fragments (≈ 650 nt) which are
ligated to each other, fragmented and prepared for sequencing[15]. Mining of
whole-genome[24, 31] and whole-exome[23, 16] data has also been used to identify mtDNA
fragments. A new approach uses methyl-specific endonucleases MspJI and AbaSI to
deplete nDNA that is likely to be methylated[32]. A failing here is that Numts are
not always methylated. PCR-amplicon biases, the inability to identify polymerase
errors and contamination from Numts call into question the sensitivity of these
methods to low-frequency variants. The ability of Numts to confound analyses is
highlighted by a study that used whole-genome data from the TCGA and inferred
that deleterious mtDNA mutations are more common in cancer cells compared to
normal tissue[24]. In contrast, findings of low mutation rates in tumor mtDNA
from a colorectal cancer study[15] are more in line with our findings that cell lines
don't exhibit higher rates of deleterious mutations compared to normal cells from
human tissues.

We have shown here that cells from a wide-range of cell lines and human sam- 
ples exhibit heteroplasmy, in accord with results from several studies[15, 16]. This 
suggests that heteroplasmy might be an essential feature of mtDNA. In fact, het- 
eroplasmy seems to provide a fingerprint that can identify cells. A larger survey 
is needed to understand the resolution of this fingerprint and its ability to distin- 
guish cellular origins. We found that mtDNA from transformed human cell-lines 
and primary human lymphocytes are similar with respect to the distributions of 
densities and frequencies of mutations (benign and deleterious ones). Non-coding 
gaps between mtDNA genes are highly conserved, indicating they might be control 
elements. 

Clonal amplification of cells does not lead to a selection of particular mtDNA
haplotypes; in fact, heteroplasmy is very stable, at least over the 25 or so divisions
of cell lines that we have studied. This stability of heteroplasmy in cell lines is 
surprising in light of 1) the higher rates of mutation in mtDNA[33] which should 
increase the diversity of mtDNA, and 2) drift, which should lead to homoplasmy in 
about 70 generations [26, 15]. The stability of heteroplasmy against drift could arise 
from exchanges of mtDNA between cells which can be inferred from our cell-line 
data (Fig. 4) in conjunction with simulations (Fig. 5) and co-culturing experiments 
(Tables 1 and 2). The transfer of mtDNA seems to occur in a selective manner, 
suggesting either there are incompatibilities between the mtDNA haplotypes or 
between certain haplotypes and the nuclear genome. There is some indication that 
the amount of transfer increases over the number of passages of co-culture, based
on our limited set of experiments; establishing this definitively requires a
longer-running experiment with sampling at different time points. A co-evolution of
mtDNA and nDNA has in fact been suggested earlier [33]. This is also consistent 
with a study in mice that suggests that mitochondria from different species cannot 
co-exist [34]. The selective advantage of certain mtDNA haplotypes can additionally 
contribute to the stability of the mtDNA. 

The exact mechanisms of mtDNA transfer are not known. Horizontal transfer of
genetic material between species of yeasts has been shown[35] and there is
increasing interest in organelle transfer between cells through microtubule formation[36].
Within a cell, networks of mitochondria are created through fusion, mediated by
fusin, which leads to the exchange of mtDNA[37]. This is necessary for functional
mitochondria; knocking out fusin causes muscles to atrophy through the
accumulation of deleterious mutations[37]. In vivo, exchange of mitochondria between cells
has also been demonstrated in the rejuvenation of cells with damaged mitochondria
by transfer of functional mitochondria from mesenchymal stem cells[38].
Rejuvenation of cells containing damaged mtDNA by transfer of functional mtDNA from
neighboring cells in culture has also been observed[39].

This is the first explicit demonstration of mtDNA transfer between cells with
functional mtDNA, whereas previous studies have shown transfer from cells with
functional mtDNA into ones with non-functioning mtDNA[39, 38]. The proposed
exchange of mtDNA between cells can explain its stability over the lifetime of 
an organism, and over generations, inferred from the relative lack of major age- 
related disorders originating in the mtDNA and the ability to infer geographic 
origins of a person from the mtDNA sequence. The stability of mtDNA against 
deleterious mutations could also be enhanced by a coupling between replication and 
transcription[40], ensuring the depletion of non-functional mtDNA by inefficiencies
in their replication. 

By making mtDNA sequencing economical, Mseek enables large-scale studies of 
heteroplasmy for GWAS applications and clinical monitoring of mtDNA in tissues. 
The sequencing of mtDNA in cell-lines allows us, for the first time, to understand 
the nature of mtDNA variability and its maintenance in cell populations. There is 
great value in surveying large populations in order to establish the normal range 
of heteroplasmy for use in GWAS studies. The transfer of functional mtDNA into 
diseased cells could be used as therapy to treat disorders arising from mtDNA 
defects. Somatic mutations in mtDNA could play a role in various human disorders 
and in aging, especially when the transfer between cells is impeded, and mechanisms
involved in mtDNA transfer might be fruitful targets for therapeutic intervention. 

Methods 

Mseek 

We have developed a new method of isolating and sequencing mtDNA (Fig. 2). The 
results section contains details of its performance. Briefly, the method consists of 
the following steps: total DNA is isolated from the sample. The nDNA is digested
using Exonuclease V. The products are purified using Ampure beads to remove 
short fragments. Using PCR primers specific to mtDNA and nDNA, the purity of 
the treated samples is tested (Fig. 1B). Following this, the sample is fragmented
using Covaris and end-repaired. Barcoded adapters compatible with the sequencing 
platform are ligated to the fragments. The universal adapters are used to amplify 
the library and prepare it for deep sequencing. 

Cell Culture 

mtDNA was isolated and sequenced from several cell lines including 293T (a human
embryonic kidney derived cell line), U2OS and Saos-2 (human osteosarcoma cell lines) and
MDA-MB-157 (metastatic human breast cancer cell line).

All cells were grown in Dulbecco's modified Eagle's medium (DMEM; Invitrogen), 
10% heat-inactivated fetal bovine serum (FBS; Invitrogen) and 50 U/ml penicillin 
and streptomycin (Pen/Strep; Invitrogen). Cultures were maintained at 37 °C in
5% CO2.

Clonal isolation of tumor cells was performed by serial dilution into 96-well plates
and visual examination of wells for single cells, which were then expanded for an 
additional 28-30 population doublings. 

Analyses 

Sequences that map to repeat elements (which occur only in the nDNA) allow 
reliable estimation of the level of nDNA contamination, which ranged from 0.5 to
1.5%.
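
One way such an estimate could be computed is sketched below. The paper does not give the formula, so this is an assumption: if a known fraction of nDNA-derived reads map to repeat elements, the number of repeat-mapping reads can be scaled up to an estimate of all nDNA reads. The function name and the `repeat_fraction_of_ndna` parameter are hypothetical, and the caller must supply a value appropriate to the genome and repeat annotation used.

```python
def ndna_contamination(
    repeat_reads: int,
    total_mapped_reads: int,
    repeat_fraction_of_ndna: float,
) -> float:
    """Estimate the fraction of mapped reads that derive from nDNA.

    Assumes repeat elements occur only in nDNA and that a fraction
    `repeat_fraction_of_ndna` of nDNA reads map to them.
    """
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    if repeat_reads < 0:
        raise ValueError("repeat_reads must be non-negative")
    if not 0.0 < repeat_fraction_of_ndna <= 1.0:
        raise ValueError("repeat_fraction_of_ndna must lie in (0, 1]")
    estimated_ndna_reads = repeat_reads / repeat_fraction_of_ndna
    return min(1.0, estimated_ndna_reads / total_mapped_reads)
```

With purely illustrative numbers, 50 repeat-mapping reads out of 10,000 mapped reads and an assumed repeat fraction of 0.5 give an estimated contamination of 1%.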

MiST[25], a variant detection tool for whole-exome data, was used to call mtDNA
variants. The reference mitochondrial genome has the accession NC_012920 from
GenBank. The mtDNA annotations are from MITOMAP13, and SNP annotations
are from dbSNP14. The error rate in MiSeq and HiSeq reads is approximately
1 in 1000, so requiring at least 3 non-clonal reads to carry the variant before making
the call reduces the error rate to well under 1 in a million. Variants with reads
predominantly on one strand are excluded to further reduce errors, based on our
previous experience[25].
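
The two filters described above, a minimum number of non-clonal supporting reads and the exclusion of strand-biased variants, can be sketched as follows. This is not MiST's implementation. Treating reads with the same start position and strand as clonal duplicates is an assumption, and the `max_strand_fraction` cutoff of 0.9 is a hypothetical value chosen only for illustration; the minimum of 3 reads is the figure given in the text.

```python
def call_variant(alt_reads, min_reads=3, max_strand_fraction=0.9):
    """Decide whether the reads supporting a variant pass both filters.

    `alt_reads` is an iterable of (start_position, strand) tuples, with
    strand given as "+" or "-", one tuple per read carrying the variant.
    """
    # Collapse clonal duplicates: identical start and strand count once.
    unique = set(alt_reads)
    if len(unique) < min_reads:
        return False
    forward = sum(1 for _, strand in unique if strand == "+")
    reverse = len(unique) - forward
    # Reject variants whose support comes predominantly from one strand.
    return max(forward, reverse) / len(unique) <= max_strand_fraction
```

Three copies of the same read, `[(100, "+")] * 3`, count as a single non-clonal read and fail the first filter, while `[(100, "+"), (120, "-"), (131, "+")]` passes both.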

We developed a pipeline to assemble the mitochondrial genome from the
deep-sequencing data. The reads assemble into a circle, and no large
deletions, duplications or other large-scale structures were detected.

Mutation Assessor[27] was used to assess the impact of mtDNA mutations on pro- 
tein function. This tool uses conservation of structure across orthologues to identify 
mutations in the DNA (and consequent changes in amino-acids) with potentially 
deleterious effects. The mutations are rated high, medium, low, or neutral based 
on their impact on protein function. We highlight the high and medium impact 
mutations in our graphs, as they may affect mitochondrial function. 

Figures



Figure 1 Performance of Mseek. (A) PCR products run on a 2% agarose gel using primers for 6
nDNA genes (OCT4, MUC, KLF4, SOX2, GAPDH and AR) and 5 regions of mtDNA, before
exonuclease digestion. (B) After digestion, the nDNA bands disappear. (C) Sizes of expected
PCR products. (D) Deep sequencing, read depth (y-axis) versus position on mtDNA (x-axis). 1.23
million mtDNA reads and 50,000 nDNA reads implying > 98% pure mtDNA.

Figure 2 The Mseek protocol.

Figure 3 Novel variants in a cell-line (MDA-MB-157, circles) and a human (PBMC, crosses).
Mutation frequency (y-axis) versus position on mtDNA (x-axis). Genes are colored bands at
bottom of graph (+, - represent the strand, cds is coding sequence). Neutral, Low, Medium, and
High are the effect of the mutation on protein function (Mutation Assessor[27]). Except for the
D-loop, most of the mtDNA codes for a transcript, with a few gaps. The longer gaps (11, 24, and
30 nt long) are marked by vertical red lines. The 45 nt long overlapping region between ATP8 and
ATP6 is marked by black vertical lines. Mutation frequencies between 0 and 1 arise from the
co-existence of multiple mtDNA haplotypes. Heteroplasmy at a cellular level is demonstrated in
Fig. 4. There are no stark differences between the human and cell-line derived mtDNA.

Figure 4 Stability of heteroplasmy. (A) C1 and C2 are colonies derived from single cells in the
original colony, placing a severe bottleneck on the mtDNA, and then passaged ≈ 25 times. (B)
Mutation frequencies in C1 (x-axis) versus C2 (y-axis) for two cell-lines, U2OS and MDA-MB-157.
The mutations mostly lie along the diagonal; the heteroplasmic mixes of mutations in the derived
colonies are similar to each other. This implies the heteroplasmic mix exists at the single-cell level,
and is stable over many divisions. A drift in frequencies is expected with random assortment of
mtDNA haplotypes (simulations, Fig. 5). The stability of the frequencies implies active
mechanisms to counteract the drift, such as the exchange of mtDNA between cells.

Figure 5 Simulations of mtDNA replication. Each cell contains a mixture of mtDNA haplotypes;
here we consider only two species, red and blue, for the sake of simplicity. At cell division, the
mitochondria assort randomly between the daughter cells and divide (with mtDNA replication) till 
a quorum of mitochondria, specific to each tissue-type, is reached. Quorum sensing is not 
well understood, but the number of mitochondria per cell is tissue-specific[9]. The graphs show
simulations of the evolution of heteroplasmy over time, based on this model. The distribution of 
haplotype frequencies spreads over time (drift), implying two randomly selected cells are unlikely 
to have the same mtDNA haplotype distribution. The plots show the distribution of frequencies 
after 1, 6, 11, 16, 21 and 26 divisions. The starting number of mitochondria per cell (N) is 250 in 
the upper panels (A,B) and 1000 in the lower panels (C,D). One of the alleles occurs with a 
frequency (p) of 0.3 in the left panels (A,C) and p=0.5 in the right panels (B,D). The drift is 
slower and the distributions narrower with larger N and smaller deviations of p from 0.5. 

Table 1 Data from mixing experiments of HCC cells co-cultured with MDA-MB-157
for 6 weeks. The column private_to identifies the cell-lines that exhibit the variant. 
Rows highlighted in gray are cases where a variant unique to HCC has been 
identified in MDA-MB-157 cells co-cultured with HCC. The light green rows are 
variants private to HCC that did not transfer into MDA-MB-157. Rows highlighted 
in blue show variants common to both cell lines. For example, at position 3796 (row 
2), the A from the reference mtDNA genome is mutated to a I only in HCC, the 
MDA-MB-157 cultured with HCC exhibits an A, with a frequency of 0.14 (or
14%). f is the frequency and c is the coverage.

Table 2 Data from mixing experiments of U2OS co-cultured with A382 for 6 weeks.
The column private_to identifies the cell-lines that exhibit the variant. The dark
gray highlights rows where a variant unique to A382 has been identified in U2OS
cells co-cultured with A382; only one case was identified, the variant at position
315 (row 2). The A382 culture from the co-culture did not yield sufficient mtDNA
for sequencing. The light blue rows are variants private to A382 that did not
transfer into U2OS.

Figure S1 Organization of mtDNA. 13 protein-coding genes, 22 tRNA genes and 2 rRNA genes
are encoded by a single circular nucleic acid and transcribed from three promoters: LSP (inner
circle of genes, - strand), HSP1 (outer circle of genes, + strand) and HSP2 (16S rRNA) on the
D-loop, which is non-coding but critical for replication and transcription. The three polycistronic
transcripts are processed by enzymatic excision of the tRNAs. There are a few small gaps (< 30
nt) in annotation, and a 45 nt overlap between ATP6 and ATP8, which might have roles in
replication.




Figure S2 Mappings of mtDNA tiles. The x-axis is the position along the mtDNA. Each band 
shows mappings of n-mer tiles (n = 36, 40, 50, 75, 100) from mtDNA on the human genome. 
Darker regions are more unique. Even 100 nt tiles from mtDNA often map to nDNA; thus, 
unambiguous identification of mtDNA variants requires efficient isolation of mtDNA. The 
pseudo-gene copies of mtDNA in the nuclear genome are called nuclear-mtDNA pseudogene 
sequences or Numts. 
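
The tiling step behind this figure can be sketched as below. This only generates the n-mer tiles; mapping them to the nuclear genome would require an aligner and is not shown. Wrapping tiles around the end of the sequence is an assumption motivated by the circular topology of mtDNA, and the function name is hypothetical.

```python
def circular_tiles(sequence: str, n: int, step: int = 1):
    """Return (start, tile) pairs of length-n tiles from a circular sequence."""
    if n <= 0 or step <= 0:
        raise ValueError("n and step must be positive")
    if n > len(sequence):
        raise ValueError("tile length cannot exceed the sequence length")
    # Append the first n-1 bases so tiles can span the circular junction.
    extended = sequence + sequence[: n - 1]
    return [
        (start, extended[start:start + n])
        for start in range(0, len(sequence), step)
    ]
```

For a toy sequence, `circular_tiles("ACGTAC", 4)` yields six tiles, the last three of which wrap around the junction between the end and the start of the sequence.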